Diffusion trainer fix: shift logits to align with input tokens #3191
base: main
Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository; check the settings in the CodeRabbit UI.

📝 Walkthrough

Introduces a new utility to shift logits to input positions and updates diffusion generation and training to apply this shift before token selection and loss computation. No public API signatures changed.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/axolotl/integrations/diffusion/utils.py (1)
162-166: Implementation looks correct for logits alignment. The function correctly shifts next-token prediction logits to align with input token positions by:
- Preserving the first logit position unchanged
- Shifting remaining logits left by one position
- Properly handling edge case of single-token sequences
The implementation aligns with the PR objective of adapting pretrained autoregressive models for diffusion fine-tuning.
However, consider adding a brief example in the docstring to clarify the transformation:
- """Align next-token logits with their input token positions for diffusion.""" + """Align next-token logits with their input token positions for diffusion. + + Example: [logit_for_pos1, logit_for_pos2, logit_for_pos3] + becomes: [logit_for_pos1, logit_for_pos1, logit_for_pos2] + """
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/axolotl/integrations/diffusion/generation.py (2 hunks)
- src/axolotl/integrations/diffusion/trainer.py (2 hunks)
- src/axolotl/integrations/diffusion/utils.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (2)
src/axolotl/integrations/diffusion/trainer.py (1)
src/axolotl/integrations/diffusion/utils.py (2)
- create_bidirectional_attention_mask (125-159)
- shift_logits_to_input_positions (162-166)
src/axolotl/integrations/diffusion/generation.py (1)
src/axolotl/integrations/diffusion/utils.py (2)
- create_bidirectional_attention_mask (125-159)
- shift_logits_to_input_positions (162-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
- GitHub Check: PyTest (3.11, 2.8.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
🔇 Additional comments (4)
src/axolotl/integrations/diffusion/trainer.py (2)
14-14: LGTM: Import addition is correct. The import of shift_logits_to_input_positions from the utils module follows the existing import pattern.
210-210: No issues detected with logits shifting alignment or loss computation.
src/axolotl/integrations/diffusion/generation.py (2)
10-10: LGTM: Import addition is consistent. The import follows the same pattern as in the trainer module for consistency.
363-363: Logits shifting applied consistently in generation. The shift is correctly applied before token sampling in the diffusion step, maintaining consistency with the training logic. This ensures the same logits alignment is used during both training and generation.
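To make the training-side usage concrete, here is a hedged sketch of how the shifted logits could feed a masked-token cross-entropy loss; the helper name, the `masked_positions` mask, and the exact loss formulation are illustrative assumptions, not the trainer's actual code:

```python
import torch
import torch.nn.functional as F

from axolotl.integrations.diffusion.utils import shift_logits_to_input_positions


def masked_diffusion_loss_sketch(
    logits: torch.Tensor,            # (batch, seq_len, vocab) raw model outputs
    labels: torch.Tensor,            # (batch, seq_len) original (unmasked) token ids
    masked_positions: torch.Tensor,  # (batch, seq_len) bool, True where tokens were noised
) -> torch.Tensor:
    # Align next-token logits with the input positions before scoring.
    shifted = shift_logits_to_input_positions(logits)
    # Cross-entropy only over the positions that were masked/noised.
    return F.cross_entropy(shifted[masked_positions], labels[masked_positions])
```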
Codecov Report: ❌ Patch coverage is
Test failure is unrelated: out of space.
| """Align next-token logits with their input token positions for diffusion.""" | ||
| if logits.size(1) <= 1: | ||
| return logits | ||
| return torch.cat([logits[:, :1], logits[:, :-1]], dim=1) |
What is this trying to do? Concatenate the logits' first column and columns 1..N together?
It's a bit of a hack to use pretrained causal LMs for diffusion fine-tuning: we're shifting the logits to the right by one position so the input logits align with the output logits.
Unfortunately this duplicates the first position's logits, but I couldn't think of a better way to do it. Open to ideas here.
Force-pushed da80beb to 7f6f08e
Just as a data point: I had messed around with an early version of Dan's diffusion trainer a while back, and here's the change I made to support next-token prediction: cf8c93e. My changes may be unnecessary, but I wanted to make sure we didn't miss anything.
Description
Title.
Motivation and Context
Pretrained autoregressive models produce output logits that are effectively right-shifted by one position: the logits at position t predict the token at position t+1. By shifting the logits back onto their input positions, we should be able to use pretrained AR models effectively for diffusion fine-tuning!
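As an illustration of the offset being corrected (a sketch, not code from this PR): in standard causal-LM training the labels are shifted so that the logit at position t is scored against the token at position t+1; for diffusion fine-tuning the logits are shifted instead, so each position's logits line up with the input token at that same position.

```python
import torch

B, T, V = 1, 4, 8
logits = torch.randn(B, T, V)            # one logit vector per input position
input_ids = torch.randint(0, V, (B, T))

# Standard causal-LM loss: logits[:, t] is scored against input_ids[:, t + 1]
# (the usual "shift the labels" convention).
ar_logits, ar_labels = logits[:, :-1, :], input_ids[:, 1:]

# Diffusion fine-tuning (this PR): shift the logits instead, so that
# shifted[:, t] lines up with input_ids[:, t]; position 0's logits are duplicated.
shifted = torch.cat([logits[:, :1], logits[:, :-1]], dim=1)
assert shifted.shape == logits.shape
```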